Lecture 1

Probability basics

Sample space

Definition: the set of all possible outcomes of an experiment of interest

Examples:

Event space

Event Space: all possible events (collection of outcomes) we will consider.

For discrete sample spaces event space is typically all possible subsets of Ω

We require that union of events and intersection of events are also events:

(1) A, B ∈ F ⟹ A ∪ B ∈ F and A ∩ B ∈ F

F is closed under unions and intersections

E.g. A = first semester = {Jan, Feb, Mar, Apr, May, Jun}

B = {May, Jun, Jul}

A or B = A ∪ B = {Jan, Feb, Mar, Apr, May, Jun, Jul}

A and B = A ∩ B = {May, Jun}

Complements

(2) not A = Aᶜ = {Jul, Aug, Sep, Oct, Nov, Dec}
(3) A ∖ B = A ∩ Bᶜ = {Jan, Feb, Mar, Apr}
(4) Aᶜ = Ω ∖ A

F is closed under union, intersection, and set difference

Venn Diagrams:


Disjoint events

(5) A and B are disjoint: A ∩ B = ∅, i.e., no elements in common

E.g.

DeMorgan's Laws

(6) (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ
(7) (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ

Example: Let J be the event "John is guilty" and M the event "Mary is guilty."

(prove DeMorgan's Laws to practice with set operations)
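As a practice aid, the laws can be checked on the month example with Python sets (an illustration, not from the original slides; the course itself uses R):

```python
# Verify De Morgan's laws on the month example using Python's set operations.
months = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"}
A = {"Jan", "Feb", "Mar", "Apr", "May", "Jun"}   # first semester
B = {"May", "Jun", "Jul"}

Ac, Bc = months - A, months - B                  # complements relative to Omega

assert months - (A | B) == Ac & Bc               # (A ∪ B)^c = A^c ∩ B^c
assert months - (A & B) == Ac | Bc               # (A ∩ B)^c = A^c ∪ B^c
print("De Morgan's laws hold on this example")
```

This is of course only a check on one example; the exercise asks for a proof.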

Probability functions

A probability function is a 'set function' that assigns a real number to each event in F:

(8) P : F → ℝ such that:

1.

(9) P(A) ≥ 0

2.

(10) P(Ω) = 1

3.

(11) P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅ (i.e. additive on disjoint sets)

The probability reflects the chances an event occurs, 0 being impossible and 1 being certain

Example: Fair coin

(12) P({H}) = P({T}) = 1/2 (we will simplify notation as P(H) = P(T))
(13) P(∅) = 0
(14) P({H,T}) = P(Ω) = 1

Example: birth month of a randomly chosen person

(15) P(Jan) = P(Feb) = ⋯ = P(Dec) = 1/12

Or perhaps a more reasonable assignment of probabilities would be proportional to the number of days in each month:

(16) P(Jan) = 31/365, P(Feb) = 28/365, …

(This shows that it is we, the users, who assign probabilities; probabilities are not 'laws of nature')


Any probability in a discrete sample space can be constructed like in the previous examples:

(17) Ω = {ω_1, ω_2, …, ω_n}
(18) p_1 + p_2 + ⋯ + p_n = 1,  p_1 ≥ 0, p_2 ≥ 0, …, p_n ≥ 0
(19) P(A) = Σ_{ω_i ∈ A} p_i,  A ⊆ Ω

is a probability function.

Exercise: show this
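A minimal sketch of the construction in Python (an illustration, not part of the original notes), using the birth-month example as the weights p_i:

```python
# Build a probability function on a discrete sample space from weights
# p_i >= 0 summing to 1, then evaluate P(A) = sum of p_i over outcomes in A.
days = {"Jan": 31, "Feb": 28, "Mar": 31, "Apr": 30, "May": 31, "Jun": 30,
        "Jul": 31, "Aug": 31, "Sep": 30, "Oct": 31, "Nov": 30, "Dec": 31}
total = sum(days.values())                       # 365
p = {m: d / total for m, d in days.items()}      # the p_i

def P(A):
    """P(A) = sum of p_i over the outcomes in the event A."""
    return sum(p[m] for m in A)

assert abs(sum(p.values()) - 1) < 1e-12          # axiom: P(Omega) = 1
first_sem = {"Jan", "Feb", "Mar", "Apr", "May", "Jun"}
print(P(first_sem))                              # 181/365
```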


Property 3 holds for any number of disjoint events:

(20) P(A ∪ B ∪ C) = P(A) + P(B) + P(C), if A ∩ B = ∅, A ∩ C = ∅, B ∩ C = ∅

E.g. the sets:

Are pairwise disjoint

(21) P(A ∪ B ∪ C) = P({Jan, Feb, Mar, Apr, May, Jun, Nov}) = 7/12
(22) P(A) = P(B) = 3/12, P(C) = 1/12

In general:

(23) P(A_1 ∪ ⋯ ∪ A_n) = P(∪_{i=1}^n A_i) = Σ_{i=1}^n P(A_i)

Provided the events are pairwise disjoint:

(24) A_k ∩ A_l = ∅, ∀ k, l ∈ {1, …, n}, k ≠ l

Derived properties

If A, B ∈ F:

(25) P(∅) = 0
(26) 0 ≤ P(A) ≤ 1
(27) P(Aᶜ) = 1 − P(A)
(28) P(A ∖ B) = P(A) − P(A ∩ B)
(29) B ⊆ A ⟹ P(A ∖ B) = P(A) − P(B)
(30) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

(Good practice exercise to show these)

Repeated experiments/Product of sample spaces

E.g. Flip a coin twice:

(31) Ω = {(H,H), (H,T), (T,H), (T,T)} = {H,T} × {H,T} = {H,T}²

E.g. Flip a coin n times:

(32) Ω = {H,T}ⁿ, all n-tuples with elements in {H,T}

E.g. Flip a coin and then pick a month at random:

(33) Ω_1 = {H,T},  Ω_2 = {Jan, Feb, Mar, …, Dec}
(34) Ω = Ω_1 × Ω_2 = {(ω_1, ω_2) : ω_1 ∈ Ω_1, ω_2 ∈ Ω_2}

Q: How many elements in Ω?

Repeated experiments/Product of sample spaces

If we have a probability function P_1 defined on Ω_1 and a probability P_2 defined on Ω_2, we can naturally define a probability P on Ω_1 × Ω_2 as:

(35) P({(ω_1, ω_2)}) = P_1(ω_1) P_2(ω_2)

E.g.

(36) P({(H, Jul)}) = P(H) × P(Jul) = 1/2 × 1/12

(This is how we model independence, which we will cover next week)

Uniform probability spaces

In many applications it makes sense to assign the same probability to all elements of a finite sample space

E.g. two coin flips:

(37)Ω={(H,H),(H,T),(T,H),(T,T)}={H,T}×{H,T}
(38) P(H,H) = P(H,T) = P(T,H) = P(T,T) = 1/4

In general, a uniform probability space with |Ω|=n, has:

(39) P(ω) = 1/n, ∀ ω ∈ Ω

And the probability of an event is the number of elements in the event divided by the total number of elements in the sample space:

(40) P(A) = |A| / |Ω|

E.g. Pick a single card from a well shuffled standard 52-card deck:

(41) P(Ace) = 4/52
(42) P(diamond suit) = P(♦) = 13/52 = 1/4

Multiplicative counting principle

Many problems in probability theory require that we count the number of ways that a particular event can occur. This kind of counting falls under the area of mathematics called combinatorics.

The Multiplicative counting principle (MP).

Suppose that we perform r experiments such that the kth experiment has nk possible outcomes, for k=1,2,,r. Then there are a total of:

(43) n_1 × n_2 × n_3 × ⋯ × n_r

possible outcomes for the sequence of r experiments.

Example 1: Need to choose a password for an online account. Password must consist of two lowercase letters (a to z) followed by one capital letter (A to Z) followed by four digits (0, 1, …, 9).

Example 2: How many subsets does a set with n elements have?

Permutations

How many five-card hands are possible from a standard fifty-two card deck? (if order matters)

(44) 52 × 51 × 50 × 49 × 48 = 52!/47! = 311,875,200 by the MP

In general, a k-permutation of n distinct objects is a way to arrange k objects out of the n in a row (order matters).

The number of k-permutations, p(n,k), is given by:

(45) p(n,k) = n! / (n−k)!

n-permutations are often referred to as just permutations. There are:

(46) n!/(n−n)! = n!/0! = n! permutations

Combinations

How many five-card hands are possible from a standard fifty-two card deck? (if order does not matter)

(47) 52 × 51 × 50 × 49 × 48 = 52!/47! ordered arrangements

Each hand is counted 5! times among these ordered arrangements, so the number of unordered arrangements is:

(48) 52! / (47! 5!) = 2,598,960

In general, a k-combination of n distinct objects is a way to arrange k objects out of the n when order does not matter.

The number of k-combinations, c(n,k), is given by:

(49) c(n,k) = n! / ((n−k)! k!) = C(n,k), the binomial coefficient "n choose k"
(50) p(n,k) = c(n,k) × k!
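Both counts are available directly in Python's standard library (a stand-in illustration; the lecture itself does these by hand):

```python
import math

# Ordered and unordered 5-card hands from a 52-card deck.
assert math.perm(52, 5) == 311_875_200        # p(52, 5), order matters
assert math.comb(52, 5) == 2_598_960          # c(52, 5), order does not matter

# Relation between the two: p(n, k) = c(n, k) * k!
assert math.perm(52, 5) == math.comb(52, 5) * math.factorial(5)
```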

Card problem

Suppose we deal a 5-card hand from a regular 52-card deck. Which is larger, P(One king) or P(Two hearts)?

(51) P(One king) = C(4,1) × C(48,4) / C(52,5)
(52) P(Two hearts) = C(13,2) × C(39,3) / C(52,5)
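Evaluating the two expressions settles the question (a quick numeric check, not part of the original slides):

```python
from math import comb

# Exactly one king: choose the king, then 4 non-kings.
p_one_king = comb(4, 1) * comb(48, 4) / comb(52, 5)     # about 0.299
# Exactly two hearts: choose 2 hearts, then 3 non-hearts.
p_two_hearts = comb(13, 2) * comb(39, 3) / comb(52, 5)  # about 0.274

print(round(p_one_king, 4), round(p_two_hearts, 4))
assert p_one_king > p_two_hearts                        # one king is more likely
```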

De Mere's problem

De Mere bet that he could get at least one six in four rolls of a die; in a second scheme, he bet on getting at least one double-six in 24 rolls of two dice. Is either bet favorable?

Simulating De Mere's problem

Single die roll in R:

Four rolls of one die

Simulating De Mere's problem

Checking if a six came up

Full simulation:
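The R code shown in class is not reproduced here; as a sketch of the full simulation (in Python, with the exact answer from the complement rule for comparison):

```python
import random

# Estimate P(at least one six in four rolls of a fair die) by simulation.
random.seed(1)
n_trials = 100_000
wins = 0
for _ in range(n_trials):
    rolls = [random.randint(1, 6) for _ in range(4)]
    if 6 in rolls:          # "a six came up" in this round
        wins += 1

estimate = wins / n_trials
exact = 1 - (5 / 6) ** 4    # complement: no six in any of the four rolls
print(estimate, round(exact, 4))   # exact is about 0.5177, so the bet is favorable
```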

De Mere's problem

Questions:

  1. Based on this simulation result, do you think the bet's favorable?

  2. Derive/compute the actual probability (hint: use that all outcomes of the four rolls of a die are equally likely)

  3. Simulate the second scheme (24 rolls of two dice). What can you say about the favorability of the bet?

  4. Derive/compute the actual probability. How about for 25 rolls of two dice?

Birthday 'paradox'

How many people n do we need to have in a room to make it a favorable bet (probability of success greater than 1/2) that two people in the room will have the same birthday?

Assume all 365 b-days are equally likely.

  1. Perform a simulation in R to answer this question (hint: use the base R function 'duplicated' to check whether there are matching b-days)

  2. Compute the probability by mathematical derivation and plot the probability as a function of n. (Hint: use the multiplication principle to count)
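A sketch covering both parts (in Python as a stand-in for the suggested R/'duplicated' approach; the exact formula follows from the multiplication principle):

```python
import random

def p_match_exact(n):
    """P(at least two of n people share a birthday), all 365 days equally likely."""
    p_no_match = 1.0
    for k in range(n):
        p_no_match *= (365 - k) / 365   # multiplication principle: all distinct
    return 1 - p_no_match

def p_match_sim(n, trials=20_000, seed=1):
    """Simulation estimate of the same probability."""
    random.seed(seed)
    hits = sum(
        len(set(random.randint(1, 365) for _ in range(n))) < n  # a duplicate exists
        for _ in range(trials)
    )
    return hits / trials

# n = 23 is the smallest room size that makes the bet favorable.
assert p_match_exact(22) < 0.5 < p_match_exact(23)
print(round(p_match_exact(23), 4), p_match_sim(23))
```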

Infinite (countable) sample spaces

E.g. flip a coin until first heads appears:

What is the right probability space for this experiment?

(53) Ω = {1, 2, 3, …} = ℕ

Sample space has to be infinite because no guarantee experiment will terminate in a finite number of steps!

If we assume that after k flips all k-tuples are equally likely what should the probability P(k) be?

(54) {k=1} = {H} ⟹ P(1) = 1/2
(55) {k=2} = {T,H} ⟹ P(2) = 1/4
(56) ⋮
(57) {k} = {T, …, T (k−1 times), H} ⟹ P(k) = 1/2^k

Does this result in a probability function?

Infinite (countable) sample spaces

For infinite sample spaces we need to change the additivity rule to a countable additivity rule:

2'.

(58) P(∪_{k=1}^∞ A_k) = Σ_{k=1}^∞ P(A_k), provided the events are disjoint (A_k ∩ A_l = ∅ ∀ k, l ∈ ℕ, k ≠ l)
(59) Ω = {H,T}^∞, all infinite sequences with elements in {H,T}

Verification of P(Ω)=1:

(60) P(Ω) = P({1, 2, 3, …}) = P(1) + P(2) + P(3) + ⋯ = Σ_{k=1}^∞ 1/2^k = 1

(Used that for a geometric series:

(61) 1 + r + r² + ⋯ = Σ_{k=0}^∞ r^k = 1/(1−r), if 0 ≤ r < 1

)

Q: What's the probability that it'll take an even number of tosses until the first heads?

Finite and countable sets: a mathematical aside

A set is finite if its elements can be put in one-to-one correspondence with:

(62) {1, 2, …, n} for some n ∈ ℕ

E.g., the set of students in the classroom, the set of inhabitants in the world, the set of stars in the Milky Way.

A set is countable if its elements can be put in one-to-one correspondence with the natural numbers:

(63) ℕ = {1, 2, 3, …}

E.g. the set of natural numbers, the set of odd numbers (n ↦ 2n+1), the set of even numbers (n ↦ 2n), the set of primes, the set of rational numbers ℚ (!)

Examples of infinite non-countable sets:

(64) ℝ, the set of irrational numbers ℝ ∖ ℚ, ℝ²

Lecture 2 - Conditional Probability and Independence

Review from Last Class

Repeated Experiments/Product of Sample Spaces

E.g. Flip a coin twice:

(65) Ω_1 = {H,T}
(66) Ω = {(H,H), (H,T), (T,H), (T,T)} = Ω_1 × Ω_1 = Ω_1²

E.g. Flip a coin n times:

(67) Ω = Ω_1 × ⋯ × Ω_1 = Ω_1ⁿ, all n-tuples with elements in {H,T}

E.g. Flip a coin and then pick a month at random:

(68) Ω_1 = {H,T},  Ω_2 = {Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec}
(69) Ω = Ω_1 × Ω_2 = {(ω_1, ω_2) : ω_1 ∈ Ω_1, ω_2 ∈ Ω_2}

Q: How many elements in Ω? (Answer: 24)

Product of Probability Spaces

If we have a probability function P_1 defined on Ω_1 and a probability P_2 defined on Ω_2, we can naturally define a probability P on Ω_1 × Ω_2 as:

(70) P({(ω_1, ω_2)}) = P_1(ω_1) P_2(ω_2)

E.g.

(71) P({(H, Jul)}) = P(H) × P(Jul) = 1/2 × 1/12

This is how we model independence.

Infinite (Countable) Sample Spaces

E.g. number of coin flips until first heads appears:

What is the right probability space for this experiment?

(72) Ω = {1, 2, 3, …} = ℕ

Sample space has to be infinite because no guarantee experiment will terminate in a finite number of steps!

If we assume that after k flips all k-tuples are equally likely what should the probability P(k) be?

(73) {k=1} = {H} ⟹ P(1) = 1/2
(74) {k=2} = {T,H} ⟹ P(2) = 1/4
(75) ⋮
(76) {k} = {T, …, T (k−1 times), H} ⟹ P(k) = 1/2^k

Does this result in a proper probability function?

For infinite sample spaces we need to change the additivity rule to a countable additivity rule:

(77) P(∪_{k=1}^∞ A_k) = Σ_{k=1}^∞ P(A_k), provided the events are disjoint (A_k ∩ A_l = ∅ ∀ k, l ∈ ℕ, k ≠ l)

Verification of P(Ω)=1:

(78) P(Ω) = P({1, 2, 3, …}) = P(1) + P(2) + P(3) + ⋯ = Σ_{k=1}^∞ 1/2^k = 1

(Used that for a geometric series: 1 + r + r² + ⋯ = Σ_{k=0}^∞ r^k = 1/(1−r) if 0 ≤ r < 1)

By extension of the rule for finite sample spaces, the probability defined above is a proper probability function.

Conditional Probability

Example: You roll a fair 6-faced die. Let A be the event that the outcome is an odd number, A={1,3,5}. Let B be the event that the outcome is less than 4, B={1,2,3}. What is the probability of A? What is the probability of A given B?

(79) P(A) = |A|/|Ω| = |{1,3,5}|/|Ω| = 3/6 = 1/2
(80) P(A|B) = |A ∩ B|/|B| = 2/3

We can write:

(81) P(A|B) = |A ∩ B|/|B| = (|A ∩ B|/|Ω|) / (|B|/|Ω|) = P(A ∩ B)/P(B)

Definition

If A and B are events, and P(B) > 0, the conditional probability of A given B is defined as:

(82) P(A|B) = P(A ∩ B)/P(B)

Example:

(83) C(4,2)/C(52,2) = (4 × 3)/(52 × 51)
(84) 3/51

Bayes Rule

From the definition we get the properties:

Multiplication rule:

(85) P(A ∩ B) = P(A|B) P(B)

Bayes rule:

(86) P(B|A) = P(A|B) P(B) / P(A)  (if P(A) > 0)


Example: There are approximately 2.6 physicians per 1,000 people in the US (from world public health data by country)

Probability of choosing a physician if we randomly choose a US inhabitant = 2.6/1000 = 0.0026

(87) P(Physician|Woman) = P(Woman|Physician) P(Physician) / P(Woman) = (0.36 × 2.6/1000) / 0.508 ≈ 1.8/1000

Some Special Cases

Conditional Probability is a Probability Function

For fixed C with P(C)>0 the conditional probability PC()=P(|C) is a probability function:

  1. P_C(A) = P(A|C) ≥ 0

  2. P_C(Ω) = P(Ω|C) = 1

  3. P_C(A ∪ B) = P_C(A) + P_C(B) if A ∩ B = ∅

Law of Total Probability

If the sample space can be partitioned as Ω = ∪_{i=1}^n A_i, with A_1, …, A_n disjoint, then:

(88) P(B) = Σ_{i=1}^n P(B|A_i) P(A_i)

(holds even for a countable partition)

In particular, for any event A, the sample space can be partitioned as Ω = A ∪ Aᶜ:

(89) P(B) = P(B|A) P(A) + P(B|Aᶜ) P(Aᶜ)

Law of total probability example

The probability of infection from a certain virus upon exposure is 10% for children of age < 13, 5% for ages 13-60, and 15% for ages 60+. What is the probability that a random individual is infected upon exposure in a population where P(Age < 13) = 0.2, P(13 ≤ Age ≤ 60) = 0.6, P(Age > 60) = 0.2?

Let I denote the event of infection:

(90) P(I) = P(I|Age<13) P(Age<13) + P(I|13≤Age≤60) P(13≤Age≤60) + P(I|Age>60) P(Age>60)
(91) = 0.1 × 0.2 + 0.05 × 0.6 + 0.15 × 0.2 = 0.08

Medical Testing Example

A diagnostic test has 99% sensitivity and 98% specificity.

If the population prevalence of the disease is 3%, what is the probability that an individual who tests positive is affected with the disease?
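The question is answered by combining Bayes rule with the law of total probability; a numeric sketch (the arithmetic, not part of the original slides):

```python
# sensitivity = P(+ | disease), specificity = P(- | no disease), prevalence = P(disease)
sens, spec, prev = 0.99, 0.98, 0.03

p_pos = sens * prev + (1 - spec) * (1 - prev)   # law of total probability: P(+)
ppv = sens * prev / p_pos                       # Bayes rule: P(disease | +)
print(round(ppv, 3))                            # about 0.605
```

Despite the accurate test, only about 60% of positives are actually affected, because the disease is rare.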

 

Monty Hall Problem

You're on a game show, and you're given the choice of three doors:

You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat.

He then says to you, "Do you want to switch to door No. 2 or keep the prize behind door No. 1?"

Should you switch? Answer: Yes! Switching gives you a 2/3 probability of winning.
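A simulation sketch of the game (an illustration in Python; the host always opens a goat door different from your pick):

```python
import random

# Estimate the probability of winning when you always switch.
random.seed(1)
n = 100_000
wins_switch = 0
for _ in range(n):
    car = random.randrange(3)
    pick = random.randrange(3)
    # Host opens a door that is neither your pick nor the car.
    opened = next(d for d in range(3) if d != pick and d != car)
    # Switch to the remaining closed door.
    switched = next(d for d in range(3) if d != pick and d != opened)
    wins_switch += (switched == car)

print(wins_switch / n)   # close to 2/3
```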


Independence

Two events A and B are independent iff (if and only if):

(92) P(A ∩ B) = P(A) P(B)

E.g. fair coin tossed twice:

(93) P(HH) = P(HT) = P(TH) = P(TT) = 1/4
(94) P({H in first toss}) = P(HH) + P(HT) = 1/4 + 1/4 = 1/2
(95) P({H in second toss}) = P(HH) + P(TH) = 1/4 + 1/4 = 1/2
(96) P({H in first toss} ∩ {H in second toss}) = P(HH) = 1/4 = P({H in first toss}) P({H in second toss})

The events {H in first toss} and {H in second toss} are independent.

(in fact any first toss outcome is independent of any second toss outcome)

Independence between A and B is equivalent to:

  1. P(A|B)=P(A)

  2. P(B|A)=P(B)

  3. A and Bc are independent (or Ac and B are independent, or Ac and Bc are independent)

Independence of Multiple Events

A1,,An are independent iff:

(97) P(A_1 ∩ A_2 ∩ ⋯ ∩ A_n) = P(A_1) P(A_2) ⋯ P(A_n)

And the equation above must also hold replacing any number of the A_i's by their complements (2ⁿ equations!)

Pairwise independence does not imply independence!!

Example: Two tosses of a fair coin

These are pairwise independent but not mutually independent

Lecture 3 - Discrete Random Variables

Review from Last Class

Q: Is disjoint the same as independent? No! Disjoint events cannot both occur, while independent events don't affect each other's probabilities.

Discrete Random Variables

A discrete random variable is a function X : Ω → ℝ that takes a finite or countable number of values x_1, x_2, …

E.g. the number on the upper face of a rolled die, the sum of two dice

Notation:

Probability Mass Function (pmf)

The pmf of a discrete random variable taking values x1,x2, is the function:

(98) p : ℝ → [0,1]
(99) p(x) = P(X = x)  (sometimes also denoted p_X(x) or f_X(x))

If X takes on the values x_1, x_2, …, then Σ_i p(x_i) = 1, and p(x) = 0 for every other x.

E.g. Fair coin flip:

(100) X = {1 if Heads; 0 if Tails}
(101) p(1) = 1/2, p(0) = 1/2, p(0.5) = p(π) = 0

Cumulative Distribution Function (cdf)

The cdf of a discrete random variable taking values x1,x2, is the function:

(102) F : ℝ → [0,1]  (sometimes denoted F_X)
(103) F(x) = P(X ≤ x), ∀ x ∈ ℝ

If X takes on the values x1,x2, then:

(104) F(x) = Σ_{x_i ≤ x} p(x_i)

E.g. Fair coin flip:

(105) X = {1 if Heads; 0 if Tails}
(106) F(x) = {0 if x < 0; 1/2 if 0 ≤ x < 1; 1 if x ≥ 1}

Both the pmf and the cdf completely characterize all the probabilistic information about a random variable (though two different random variables can have the same pmf and cdf).

Properties of the Cumulative Distribution Function

F is non-decreasing, right-continuous, and satisfies lim_{x→−∞} F(x) = 0 and lim_{x→+∞} F(x) = 1.

Bernoulli Distribution

A random variable X has a Bernoulli distribution with parameter p (0 ≤ p ≤ 1) if:

(107) f_X(1) = p
(108) f_X(0) = 1 − p
(109) X ∼ Bern(p) or X ∼ Bernoulli(p)

Binomial Distribution

n independent Bernoulli trials (e.g. flipping a coin n times): X_1, …, X_n ∼ Bernoulli(p)

X=X1++Xn counts the number of successes in n trials

(110) X ∼ Bin(n,p) or X ∼ Binomial(n,p)
(111) f_X(k) = P(X = k) = C(n,k) p^k (1−p)^{n−k},  k = 0, 1, …, n

Example: Side Effects

Suppose it is known that 5% of adults who take a certain medication experience negative side effects. What is the probability that more than k patients in a random sample of 100 will experience negative side effects?

(112)P(X>1 patients experience side effects)=?
(113)P(X>5 patients experience side effects)=?
(114)P(X>15 patients experience side effects)=?

Binomial Distribution in R

pmf, cdf, and random generation of a binomial random variable

Side Effects Example (continued)

(115) P(X > 1) = 1 − P(X ≤ 1) = 1 − F_X(1)
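The three tail probabilities can be computed exactly from the binomial pmf (a Python stand-in for R's dbinom/pbinom, not part of the original slides):

```python
from math import comb

# X ~ Binomial(100, 0.05): number of patients with side effects.
n, p = 100, 0.05

def binom_pmf(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def p_greater(k):
    """P(X > k) = 1 - F_X(k)."""
    return 1 - sum(binom_pmf(j) for j in range(k + 1))

for k in (1, 5, 15):
    print(k, round(p_greater(k), 4))
```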

Binomial distribution pmf


Binomial distribution cdf


Simulating a binomial 3 different ways

Generating n Bernoulli trials

Generating n Bernoulli trials using rbinom()

Directly sampling from the binomial
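The R demonstrations (`rbinom` and friends) are not reproduced here; as a sketch of the underlying ideas in Python, two of the approaches are summing Bernoulli trials and sampling directly by inverting the cdf:

```python
import random
from math import comb

random.seed(1)
n, p = 10, 0.3

# Way 1: sum of n Bernoulli trials (each uniform draw below p is a success).
def binom_by_bernoulli(n, p):
    return sum(random.random() < p for _ in range(n))

# Way 2: inversion -- walk the cdf until it exceeds a single uniform draw.
def binom_by_inversion(n, p):
    u, acc = random.random(), 0.0
    for k in range(n + 1):
        acc += comb(n, k) * p**k * (1 - p)**(n - k)
        if u <= acc:
            return k
    return n

x_bern = [binom_by_bernoulli(n, p) for _ in range(10_000)]
x_inv = [binom_by_inversion(n, p) for _ in range(10_000)]
print(sum(x_bern) / len(x_bern), sum(x_inv) / len(x_inv))  # both near n*p = 3
```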

Geometric Distribution

A discrete random variable X has a geometric distribution with parameter p, where 0<p1, if its probability mass function is given by:

(116) p_X(k) = P(X = k) = (1−p)^{k−1} p,  k = 1, 2, …
(117) X ∼ Geo(p) or X ∼ Geometric(p)

Models the (discrete) waiting time until an event happens, e.g. the number of trials until the first heads.

Example: Concert Ticket

You and a friend want to go to a concert, but there's only one ticket left. The salesperson decides to toss a coin until heads appears. In each toss heads appears with probability p, where 0<p<1, independent of each of the previous tosses. If the number of tosses needed is odd, your friend is allowed to buy the ticket; otherwise you can buy it. Would you agree to this arrangement?

Geometric Memoryless Property

(118) P(X > n + k | X > k) = P(X > n)

The probability that it will take n additional trials, given that the first k trials are failures, is the same as the probability that it takes n trials from the beginning of the experiment.

Geometric Distribution in R

pmf, cdf, and random generation of a geometric random variable

Warning: The definition of the geometric distribution in R is the number of failures before the first success, i.e. X − 1.
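The convention difference can be seen in a small simulation (a Python sketch of our class convention, where X ≥ 1 counts trials; R's rgeom would report these values minus 1):

```python
import random

random.seed(1)
p = 0.25

def rgeom_trials(p):
    """Number of trials until the first success (class convention, X >= 1)."""
    k = 1
    while random.random() >= p:   # failure: keep tossing
        k += 1
    return k

xs = [rgeom_trials(p) for _ in range(50_000)]
mean = sum(xs) / len(xs)
print(mean)   # near E[X] = 1/p = 4; R's convention would give about 1/p - 1 = 3
```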

Geometric distribution pmf


Geometric distribution cdf


Lecture 4 - Continuous Random Variables

Review from Last Class

Continuous Random Variables

A continuous random variable is a function X : Ω → ℝ that takes uncountably many values and is such that for a ≤ b:

(119) P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx

for some function f_X with f_X(x) ≥ 0 ∀ x ∈ ℝ and ∫_{−∞}^{+∞} f_X(t) dt = 1.

f_X : ℝ → ℝ is called the probability density function (or density function) of the random variable X.


Probability Density Function

Properties of the pdf:

Fundamental Theorem of Calculus

Part I: Let f be a continuous real-valued function defined on a closed interval [a,b]. Let F be the function defined, for all x in [a,b], by:

(120) F(x) = ∫_a^x f(t) dt

Then F is uniformly continuous on [a,b] and differentiable on the open interval (a,b), and:

(121) F′(x) = f(x)

for all x in (a,b).

Part II: Let f be a real-valued function on a closed interval [a,b] and F an antiderivative of f on (a,b):

(122) F′(x) = f(x)

If f is (Riemann) integrable on [a,b] then:

(123) ∫_a^b f(x) dx = F(b) − F(a)

Example

Let X be a continuous random variable that takes values in [0,1] and whose distribution function is given by:

(124) F(x) = {0 if x < 0; 2x² − x⁴ if 0 ≤ x ≤ 1; 1 if x > 1}


  1. Compute P(1/4 ≤ X ≤ 3/4)

  2. What is the probability density function of X?

  3. Compute P(1/4 ≤ X ≤ 1/2 or X > 3/4)

Uniform Distribution, X ∼ U[a,b]

(125) f(x) = {1/(b−a) if x ∈ [a,b]; 0 otherwise}
(126) F(x) = {0 if x < a; (x−a)/(b−a) if a ≤ x < b; 1 if x ≥ b}


Exponential Distribution, X ∼ Exp(λ)

(127) f(x) = {λ e^{−λx} if x ≥ 0; 0 if x < 0}
(128) F(x) = {1 − e^{−λx} if x ≥ 0; 0 if x < 0}


Normal Distribution, X ∼ N(μ, σ²)

(129) f(x) = φ(x) = (1/(√(2π) σ)) e^{−(1/2)((x−μ)/σ)²}
(130) F(x) = Φ(x)

There is no closed-form formula for the cdf, but it can be computed numerically.


Pareto Distribution, X ∼ Pareto(x_m, α)

(131) f(x) = {α x_m^α / x^{α+1} if x ≥ x_m; 0 if x < x_m}
(132) F(x) = {0 if x < x_m; 1 − (x_m/x)^α if x ≥ x_m}


Quantiles

Let X be a continuous random variable and let p be a number between 0 and 1. The pth quantile or 100pth percentile of the distribution of X is the smallest number qp such that:

(133) F(q_p) = P(X ≤ q_p) = p

The median of a distribution is its 50th percentile.


Example 1: Median of Exponential, X ∼ Exp(λ)
(134) 1/2 = F(q_{0.5}) = P(X ≤ q_{0.5}) = 1 − e^{−λ q_{0.5}}
(135) q_{0.5} = −(1/λ) log(1/2) = log(2)/λ
Example 2: Median of Uniform on [1,2] ∪ [3,4]
(136) q_{0.5} = 2

Uniform Distribution in R

pdf, cdf, random generation, and quantile of a uniform random variable

Exponential Distribution in R

pdf, cdf, random generation, and quantile of an exponential random variable

Normal Distribution in R

pdf, cdf, random generation, and quantile of a normal random variable

Pareto Distribution in R

pdf, cdf, random generation, and quantile of a Pareto random variable with location x_m and shape α
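The R demonstrations themselves are not reproduced here; as an illustration, the d/p/q/r quartet for the exponential can be sketched in Python (mirroring R's dexp/pexp/qexp/rexp, with rate `lam` an arbitrary choice):

```python
import math, random

lam = 1 / 5   # rate parameter (illustrative value)

def dexp(x):  return lam * math.exp(-lam * x) if x >= 0 else 0.0   # pdf
def pexp(x):  return 1 - math.exp(-lam * x) if x >= 0 else 0.0     # cdf
def qexp(p):  return -math.log(1 - p) / lam                        # quantile (inverse cdf)
def rexp():   return qexp(random.random())                         # sampling by inversion

assert abs(qexp(0.5) - math.log(2) / lam) < 1e-12   # median = log(2)/lambda
assert abs(pexp(qexp(0.9)) - 0.9) < 1e-12           # q is the inverse of p
```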

Mixtures of Distributions

Example 1: Discrete Mixture

To get to your destination you take a taxi if there is one waiting (probability 1/3) at the stand when you arrive or walk if there is no taxi waiting. A taxi takes you exactly 5 minutes. Walking to your destination takes you exactly 35 minutes. What is the cdf of the time to your destination T?

(137) P(T = t) = P(T = t|Taxi) P(Taxi) + P(T = t|No taxi) P(No taxi) = {1/3 if t = 5; 2/3 if t = 35}
(138) F_T(t) = {0 if t < 5; 1/3 if 5 ≤ t < 35; 1 if t ≥ 35}
Example 2: Continuous Mixture

To get to your destination you take a taxi if one is waiting (probability 1/3) at the stand when you arrive or walk if there is no taxi waiting. Walking to your destination takes you an amount of time distributed as Exp(λ1) with λ1=1/35. A taxi takes you an amount of time distributed as Exp(λ2) with λ2=1/5. What is the cdf of the time to get to your destination, T?

(139) F_T(t) = P(T ≤ t) = P(T ≤ t|Taxi) P(Taxi) + P(T ≤ t|No taxi) P(No taxi)
(140) = (1/3)(1 − e^{−t/5}) + (2/3)(1 − e^{−t/35}), for t ≥ 0
(141) f_T(t) = (d/dt) F_T(t) = F_T′(t) = (1/3)(1/5) e^{−t/5} + (2/3)(1/35) e^{−t/35}
Example 3: Mixed Distribution

To get to your destination you take a taxi if one is waiting (probability 1/3) when you arrive or walk if there is no taxi. Walking to your destination takes you exactly 35 minutes. A taxi takes an amount of time distributed as Exp(λ2) with λ2=1/5. What is the cdf of the time to your destination T?

(142) F_T(t) = P(T ≤ t) = P(T ≤ t|Taxi) P(Taxi) + P(T ≤ t|No taxi) P(No taxi)
(143) P(T ≤ t|No taxi) = {0 if t < 35; 1 if t ≥ 35}
(144) P(T ≤ t|Taxi) = {0 if t < 0; 1 − e^{−t/5} if t ≥ 0}
(145) F_T(t) = {0 if t < 0; (1/3)(1 − e^{−t/5}) if 0 ≤ t < 35; (2/3) + (1/3)(1 − e^{−t/5}) if t ≥ 35}

Discrete, Continuous, and Mixed Random Variables

Lecture 5 – Expectation, variance, and transformations of RVs

Last class

For a continuous random variable, it follows from the definition of pdf and the fundamental theorem of calculus that for ab:

(146) P(a < X ≤ b) = ∫_a^b f_X(t) dt = F_X(x)|_a^b = F(b) − F(a)

But P(a < X ≤ b) = F(b) − F(a) is true for any random variable (discrete, continuous, or mixed): let A = {X ≤ a}, B = {X ≤ b}. Clearly, A ⊆ B and B ∖ A = {a < X ≤ b}.

(147) P(a < X ≤ b) = P(B ∖ A) = P(B) − P(A) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a)

For a continuous RV (but not for a discrete or mixed one) this also equals P(a ≤ X ≤ b), P(a < X < b), …

Expectation for discrete RVs

The expected value is a weighted average of the values a random variable takes, weighted by the probability of taking those values. It's the center of 'gravity' where the distribution 'balances'.

For a discrete random variable X, with f_X(x) the probability mass function of X and {x_1, x_2, …} the support of X:

(148) E[X] = Σ_{x_i ∈ supp(X)} x_i f_X(x_i)

The support of a discrete random variable X is the set of points that X takes with non-zero probability:

(149) supp(X) = {x_i : P(X = x_i) > 0}

Expectation discrete examples: Bernoulli

Bernoulli trial, XBernoulli(p)

(150) X = {1 with probability p; 0 with probability 1 − p}

f_X(1) = p

f_X(0) = 1 − p  (0 ≤ p ≤ 1)

(151) E[X] = 0 × (1 − p) + 1 × p = p

Expectation discrete examples: Binomial

XBinomial(n,p); models the number of successes in n trials

(152) f_X(k) = C(n,k) p^k (1−p)^{n−k},  k = 0, 1, …, n
(153) E[X] = Σ_{k=0}^n k C(n,k) p^k (1−p)^{n−k} = Σ_{k=1}^n k C(n,k) p^k (1−p)^{n−k}
    = Σ_{k=1}^n k · n!/((n−k)! k!) · p^k (1−p)^{n−k} = np Σ_{k=1}^n (n−1)!/((n−k)! (k−1)!) · p^{k−1} (1−p)^{n−k}
    = np Σ_{k=1}^n C(n−1, k−1) p^{k−1} (1−p)^{n−k} = np Σ_{j=0}^{n−1} C(n−1, j) p^j (1−p)^{n−1−j}
    = np (p + (1−p))^{n−1} = np

Expectation discrete examples: Poisson

A discrete random variable X is said to have a Poisson(λ) distribution with parameter λ>0 if

(154) P(X = k) = e^{−λ} λ^k / k!

for k = 0, 1, 2, …

 

Used to model the number of events happening in a period of time, e.g. the number of mutations per unit length in a DNA strand, the number of new patients (incidence rates), the number of phone calls/particles arriving in a system, etc.


Expectation examples: Poisson (contd)

(155) E[X] = Σ_{k=0}^∞ k f_X(k) = Σ_{k=0}^∞ k e^{−λ} λ^k / k! = Σ_{k=1}^∞ k e^{−λ} λ^k / k!
    = λ e^{−λ} Σ_{k=1}^∞ λ^{k−1}/(k−1)! = λ e^{−λ} Σ_{j=0}^∞ λ^j / j!
    = λ e^{−λ} e^{λ} = λ

Expectation for continuous RVs

For a Continuous random variable X, with fX(x) the probability density function of X, the expectation is defined as:

(156) E[X] = ∫_{−∞}^{+∞} x f_X(x) dx

NOTE: The expectation may not exist, e.g. for the Cauchy distribution:

(157) f(x) = 1/(π(1 + x²)),  −∞ < x < +∞

Expectation examples: continuous with finite support

XF(x)

CDF:

(158) F(x) = {0 if x < 0; 2x² − x⁴ if 0 ≤ x ≤ 1; 1 if x > 1}

PDF:

(159) f(x) = F′(x) = {4x − 4x³ if 0 ≤ x ≤ 1; 0 otherwise}

Expectation Calculation:

(160) E[X] = ∫_{−∞}^{+∞} x f(x) dx = ∫_0^1 x(4x − 4x³) dx = 4(x³/3 − x⁵/5)|_0^1 = 8/15

Expectation examples: continuous with infinite support

X ∼ Exp(λ), f(x) = λ e^{−λx} for x ≥ 0

(161) E[X] = ∫_{−∞}^{+∞} x f(x) dx = ∫_0^{+∞} x λ e^{−λx} dx = (1/λ) ∫_0^{+∞} t e^{−t} dt

(Using t = λx, dt = λ dx)

(162) E[X] = (1/λ) ∫_0^{+∞} t e^{−t} dt = (1/λ)(−t e^{−t}|_0^{+∞} + ∫_0^{+∞} e^{−t} dt) = (1/λ)(−e^{−t})|_0^{+∞} = 1/λ

(integrating by parts)

Expectation for a mixed random variable

For a mixed random variable X with cdf F(x) = p F_1(x) + (1−p) F_2(x), with F_1 continuous and F_2 discrete:

(163) E[X] = p ∫_{−∞}^{+∞} x f_1(x) dx + (1−p) Σ_{x_i} x_i f_2(x_i)

where f_1(x) is the density of the continuous component, f_1(x) = F_1′(x), and f_2 is the mass function of the discrete component.

Probabilities as expectations

For a set A ⊆ Ω the random variable:

(164) I_A(ω) = {1 if ω ∈ A; 0 if ω ∉ A}

is called the indicator function of the set A.

What is the distribution of IA?

I_A ∼ Bernoulli(p), p = P(A) ⟹ E[I_A] = P(A)

This allows us to work with random variables (indicator functions) instead of sets, and with expectations instead of probabilities.

Exercise: If A, B ⊆ Ω, what are I_A I_B, min(I_A, I_B), max(I_A, I_B)?

Transformations of random variables

Example: X ∼ F(x)

(165) F(x) = {0 if x < 0; 2x² − x⁴ if 0 ≤ x ≤ 1; 1 if x > 1}

What is the cdf of Y = −√(X+1)?

First, 0 ≤ X ≤ 1 ⟹ −√2 ≤ −√(X+1) ≤ −1

Transformations of random variables

For −√2 ≤ y ≤ −1:

(166) F_Y(y) = P(Y ≤ y) = P(−√(X+1) ≤ y) = P(−y ≤ √(X+1)) = P(y² − 1 ≤ X)
    = 1 − P(X < y² − 1) = 1 − P(X ≤ y² − 1) = 1 − F_X(y² − 1) = 1 − 2(y² − 1)² + (y² − 1)⁴

Final CDF:

(167) F_Y(y) = {0 if y < −√2; 1 − 2(y² − 1)² + (y² − 1)⁴ if −√2 ≤ y ≤ −1; 1 if y > −1}

Change-of-variable formula for the expectation

Continuous X:

(168) E[g(X)] = ∫_{−∞}^{+∞} g(x) f_X(x) dx

Discrete X:

(169) E[g(X)] = Σ_{x_i} g(x_i) f_X(x_i)

This allows us to compute the expectation of Y = g(X) without deriving the pdf or pmf of g(X)!

Change-of-variable formula example

In the previous example, computing E[Y] = ∫_{−√2}^{−1} y f_Y(y) dy seems to require knowing/deriving f_Y(y) = F_Y′(y) and integrating using the pdf of Y (a big mess).

The change-of-variable formula allows us to use a shortcut:

(170) E[g(X)] = ∫_{−∞}^{+∞} g(x) f_X(x) dx

This only requires us to integrate using the pdf of X!

(171) E[−√(X+1)] = ∫_0^1 −√(x+1) (4x − 4x³) dx

Using Mathematica: Integrate[-Sqrt[1 + x] (4 x - 4 x^3), {x, 0, 1}]

(172) E[−√(X+1)] = −(208 + 128√2)/315 ≈ −1.235

Variance

(173) Var[X] = E[(X − E[X])²] = E[X²] − E[X]²

Var[X] is the average squared deviation from the mean. Measure of dispersion/concentration.

Continuous X:

(174) E[X²] = ∫_{−∞}^{+∞} x² f_X(x) dx

Discrete X:

(175) E[X²] = Σ_{x_i} x_i² f_X(x_i)
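The equality of the two forms of the variance can be checked exactly on a fair die (an illustration, not from the original notes; exact rational arithmetic avoids rounding):

```python
from fractions import Fraction

# Check Var[X] = E[(X - E[X])^2] = E[X^2] - E[X]^2 on a fair six-sided die.
support = range(1, 7)
p = Fraction(1, 6)

EX  = sum(x * p for x in support)                 # 7/2
EX2 = sum(x * x * p for x in support)             # 91/6
var_a = sum((x - EX) ** 2 * p for x in support)   # definition
var_b = EX2 - EX ** 2                             # shortcut formula

assert var_a == var_b == Fraction(35, 12)
print(var_a)
```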

Week 6 – Random vectors and independence

Last class

Expectation

Variance

Change of variable formula/LOTUS

Random vectors

We are typically interested in not one, but multiple related random variables defined on the same space.

Discrete random vectors

A random vector (X,Y) is discrete if it takes a finite or countable number of values R_{X,Y} = {(x_1,y_1), (x_2,y_2), …}

Joint probability mass function: P_{X,Y}(x, y) = P(X = x, Y = y)

In general, if C ⊆ ℝ², P((X,Y) ∈ C) = Σ_{(x_i, y_j) ∈ C} P_{X,Y}(x_i, y_j)

X and Y are discrete random variables ⟺ (X,Y) is a discrete random vector

Continuous random vectors

A random vector (X,Y) is continuous if it has a joint probability density function f_{X,Y} ≥ 0 such that P((X,Y) ∈ C) = ∬_C f_{X,Y}(x,y) dx dy.

(X,Y) continuous as a random vector ⟹ X and Y continuous as individual random variables

The converse is not true: X and Y continuous ⇏ (X,Y) continuous as a random vector (it may not have a joint density)

Random vectors

The cumulative distribution function (cdf) is defined for both discrete and continuous (and mixture) random vectors:

(176) F_{X,Y}(x,y) = P(X ≤ x, Y ≤ y) = {∫_{−∞}^x ∫_{−∞}^y f_{X,Y}(s,t) dt ds (continuous); Σ_{x_i ≤ x} Σ_{y_j ≤ y} P_{X,Y}(x_i, y_j) (discrete)}

Just like for random variables, the pdf (continuous), the pmf (discrete), or the cdf (both) completely characterizes a random vector probabilistically.

For a continuous random vector:

(177) f_{X,Y}(x,y) = ∂²/∂x∂y F_{X,Y}(x,y)

The distributions (pdf, pmf, or cdf) of the component random variables X and Y are called the marginal distributions of X and Y respectively.

Marginal distributions

Let (X,Y) be a random vector.

(178) F_X(x) = lim_{y→+∞} F_{X,Y}(x,y),  F_Y(y) = lim_{x→+∞} F_{X,Y}(x,y)

For a continuous random vector:

(179) f_X(x) = ∫_{−∞}^{+∞} f_{X,Y}(x,y) dy,  f_Y(y) = ∫_{−∞}^{+∞} f_{X,Y}(x,y) dx

For a discrete random vector:

(180) P_X(x_i) = Σ_{y_j} P_{X,Y}(x_i, y_j),  P_Y(y_j) = Σ_{x_i} P_{X,Y}(x_i, y_j)

Example: discrete random vector

Let M and S be the minimum and the sum of two independent rolls of a fair 3-faced die.

Determine:
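A sketch for working this example (in Python, not part of the original notes): enumerate the 9 equally likely pairs, build the joint pmf of (M, S), and sum it for the marginals.

```python
from fractions import Fraction
from collections import defaultdict

# Joint pmf of (M, S) = (min, sum) of two independent rolls of a fair 3-faced die.
joint = defaultdict(Fraction)
for a in (1, 2, 3):
    for b in (1, 2, 3):
        joint[(min(a, b), a + b)] += Fraction(1, 9)   # 9 equally likely pairs

pM = defaultdict(Fraction)
pS = defaultdict(Fraction)
for (m, s), pr in joint.items():
    pM[m] += pr        # marginal of M: sum the joint pmf over s
    pS[s] += pr        # marginal of S: sum the joint pmf over m

assert sum(joint.values()) == 1
assert pM[1] == Fraction(5, 9)    # min = 1 in 5 of the 9 pairs
assert pS[2] == Fraction(1, 9)    # only (1,1) sums to 2
```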

Example: continuous random vector

Suppose that the joint cumulative distribution function of (X,Y) is given by:

(181) F_{X,Y}(x,y) = {1 − e^{−2x} − e^{−y} + e^{−(2x+y)} if x > 0, y > 0; 0 otherwise}


Example: continuous random vector

(182) F_{X,Y}(x,y) = {1 − e^{−2x} − e^{−y} + e^{−(2x+y)} if x > 0, y > 0; 0 otherwise}
  1. Determine the joint probability density function of X and Y.

  2. Determine the marginal cumulative distribution functions of X and Y.

  3. Determine the marginal probability density functions of X and Y.

  4. Find out whether X and Y are independent.

  5. Determine Cov(X,Y) and ρ(X,Y)

Example: continuous random vector

Suppose that the joint probability density function of X and Y is given:

(183) f_{X,Y}(x,y) = {x + cy² if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1; 0 otherwise}
  1. Find the constant c.

  2. Determine the joint cumulative distribution functions of (X,Y).

  3. Determine the marginal probability density functions of X and Y.

  4. Find out whether X and Y are independent.

  5. Determine Cov(X,Y) and ρ(X,Y)

Independence of random variables

X and Y are independent if for any A, B ⊆ ℝ, {X ∈ A} and {Y ∈ B} are independent events:

(184) P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B)

Equivalently:

(185) F_{X,Y}(x,y) = F_X(x) F_Y(y), ∀ x, y
(186) f_{X,Y}(x,y) = f_X(x) f_Y(y)  (continuous case)
(187) P_{X,Y}(x,y) = P_X(x) P_Y(y)  (discrete case)

Propagation of independence

NOTE this is important and not covered in the book

Examples:

  1. X_1, X_2, X_3, X_4, Y_1, Y_2 independent ⟹ Z_1 = sin(X_1²) + e^{X_2} X_3⁵ + 1 and Z_2 = cos(Y_1) Y_2³ are independent

  2. X_1, X_2, …, X_n iid ∼ Bernoulli(p), Y_1, Y_2, …, Y_m iid ∼ Bernoulli(q)  (iid stands for independent, identically distributed)

Let X = X_1 + X_2 + ⋯ + X_n, Y = Y_1 + Y_2 + ⋯ + Y_m.

Then X ∼ Binomial(n,p), Y ∼ Binomial(m,q), and X, Y are independent.

Expectation of a random vector

(188)E[(X,Y)]=(E[X],E[Y])

The interpretation is analogous to that for random variables: the 'center' of the two-dimensional distribution, the center of mass if we think of probability as mass distributed over the plane ℝ².

In general, for X = (X_1, …, X_n):

(189) E[X] = (E[X_1], …, E[X_n])

Example: XExp(λ1),YExp(λ2)

(190) E[(X,Y)] = (E[X], E[Y]) = (1/λ_1, 1/λ_2)

Multi-dimensional LOTUS

Let X = (X_1, …, X_n) be a random vector and g : ℝⁿ → ℝ a function; then:

(191) E[g(X)] = E[g(X_1, …, X_n)] = {∫_{ℝⁿ} g(x_1, …, x_n) f_X(x_1, …, x_n) dx_1 ⋯ dx_n if X continuous; Σ_{x_i} g(x_i) f_X(x_i) if X discrete}

Consequences:

(192) E[a_1X_1 + a_2X_2 + ⋯ + a_nX_n] = a_1E[X_1] + a_2E[X_2] + ⋯ + a_nE[X_n]

Example of linearity: XBinomial(n,p)

Then X = X_1 + ⋯ + X_n where X_1, X_2, …, X_n ∼ Bernoulli(p)

(193) E[X] = E[X_1] + ⋯ + E[X_n] = p + ⋯ + p (n times) = np

Covariance

For arbitrary random variables X and Y:

(194) Var[X + Y] = Var[X] + Var[Y] + 2E[(X − E[X])(Y − E[Y])]

The term:

(195) Cov(X,Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y]

is called the covariance of X and Y.

It measures how much X and Y co-vary, i.e. vary together.

Covariance

Example of additivity of variance for uncorrelated RVs

XBinomial(n,p)

Then X = X_1 + ⋯ + X_n where X_1, X_2, …, X_n iid ∼ Bernoulli(p)

(196) Var[X] = Var[X_1] + ⋯ + Var[X_n] + Σ_{i<j} 2 Cov(X_i, X_j) = p(1−p) + ⋯ + p(1−p) (n times) = np(1−p), since Cov(X_i, X_j) = 0 by independence

Properties of covariance

(197) Cov(aX + bY, cU + dV) = ac Cov(X,U) + ad Cov(X,V) + bc Cov(Y,U) + bd Cov(Y,V)
(198) Cov(X,X) = Var[X]  (Var[aX] = a² Var[X])
(199) |E[XY]| ≤ √(E[X²] E[Y²])
(200) |Cov(X,Y)| ≤ √(Var[X] Var[Y]) = σ_X σ_Y

(σ_X = √Var[X], σ_Y = √Var[Y] are called the standard deviations of X and Y respectively)

(201) −1 ≤ Cov(X,Y) / √(Var[X] Var[Y]) ≤ 1

Correlation

(202) ρ(X,Y) = Cov(X,Y) / √(Var[X] Var[Y]) is the correlation between X and Y
(203) ρ(aX + b, cY + d) = ρ(X,Y) for a, c > 0